Loop control circuit for a data processor

ABSTRACT

A data processor ( 200 ) includes an operation execution unit ( 225 ) for executing instructions from an instruction memory ( 210 ) indicated by a program counter ( 220 ). A loop control circuit ( 230 ) stores respective associated loop information for a plurality of instruction loops in a register bank ( 232 ). The loop information includes at least an indication of an end of the loop and a loop count for indicating a number of times the loop should be executed. The loop control circuit ( 230 ) detects that one of the loops needs to be executed and in response to said detection, loads the loop information for the corresponding loop, and controls the program counter to execute the corresponding loop according to the loaded loop information. The loop information is initialized in response to a loop initialization instruction ( 240 ), where the initialization instruction is issued prior to and independent of a start of the loop initialized by the loop information.

FIELD OF THE INVENTION

The invention relates to a loop control circuit for a data processor, to a data processor with a loop control circuit, and to a method of executing a loop in a data processor.

BACKGROUND OF THE INVENTION

The performance of processors continuously increases. This brings functionality traditionally implemented using hardware within the reach of execution by a processor under control of a suitable program. It also enables software-based signal processing of new functionality or existing functionality at increased quality. An example of new functionality is third generation wireless communication, such as based on the UMTS/FDD, TDD, IS2000, and TD-SCDMA standards. These systems operate at very high frequencies. Modems (transceivers) for 3G mobile communication standards such as UMTS require approximately 100 times more digital signal processing power than GSM. It is desired to implement a transceiver for such standards using a programmable architecture in order to be able to deal with different standards and to be able to flexibly adapt to new standards. Using conventional DSP technology operating at conventional frequencies could require as many as 30 DSPs to provide the necessary performance. It will be clear that such an approach is neither cost-effective nor power efficient compared to conventional hardware-based approaches of transceivers for single standards. The digital signal processing capabilities of a processor can be increased by using pipelining.

U.S. Pat. No. 4,792,892 describes a pipelined processor. To execute a loop control instruction that specifies repeated execution N times of a sequence of “T” instructions, the processor includes a loop circuit having an instruction counter which counts execution of the instructions in the loop sequence and produces an end-of-sequence signal upon each completion of the loop. A register is used that refreshes the program counter with the address of the first instruction in the loop in response to each end-of-sequence signal. A loop counter is used for counting the number of completions of the loop; it delivers a signal indicating the end of the loop portion of the entire program and enables the program counter to continue on with the rest of the program. Pipelined calculations are critical; inter alia, the arguments and results have to be presented and read in accordance with a narrow configuration. The disclosed pipelined processor allows a loop control instruction for initializing the loop to be executed a number “D” instructions before the start of the loop. The loop control circuit incorporates a counter to count the “D” instructions before triggering execution of the loop sequence “N” times. The known system provides more scheduling freedom for pipelined operation involving one loop.

A further way of improving the performance of a processor is to use a vector processor. A vector consists of more than one data element, for example sixteen 16-bit elements. A functional unit of the processor operates on all individual data elements of the vector in parallel, triggered by one instruction. The conventional vector processor architecture is ineffective for applications that are not highly vectorizable. For use in consumer electronics applications, in particular mobile communication, the additional costs of a vector processor can only be justified if a significant speed-up can be achieved.

SUMMARY OF THE INVENTION

It is an object of the invention to provide a processor, loop control circuit and method of executing a loop that better support high-performance processing.

To meet the object of the invention, a data processor for executing instructions stored in an instruction memory and which are specified by a program counter includes an operation execution unit for executing instructions indicated by the program counter; and a loop control circuit operative to store respective associated loop information for a plurality of instruction loops; the loop information for an instruction loop including at least an indication of an end of the loop and a loop count for indicating a number of times the loop should be executed; detect that one of the loops needs to be executed and in response to said detection, load the loop information for the corresponding loop, and control the program counter to execute the corresponding loop according to the loaded loop information; initialize the loop information in response to a loop initialization instruction, where the initialization instruction is issued prior to and independent of a start of the loop initialized by the loop information.

According to the invention, multiple loops can be initialized where the loop initialization is independent of the start of the loop. For each loop at least a loop count and an indication of an end of the loop (e.g. in the form of an address of the last instruction in the loop sequence or in the form of a number of instructions in the sequence, specifying an end of the sequence relative to a start address of the sequence) are stored. In the prior art system of U.S. Pat. No. 4,792,892 a loop is automatically started after “D” instructions have been executed since the loop initialization instruction. Such an approach is particularly difficult, if not impossible, to use with more than one loop, since it may not be known after how many instructions a second loop needs to be started. It should also be noted that a zero-overhead looping implementation is known from the R.E.A.L. DSP of Philips Electronics that allows multiple loops to be specified. This DSP allows pre-initialization of a loop by specifying the loop end address using a loop initialization instruction. The initiation (i.e. start) of the loop is coupled to the remaining part of the loop initialization where the loop counter is specified. Providing the loop counter automatically initiates the corresponding loop. This means that starting a loop always requires one dedicated loop initialization/initiation instruction to be inserted into the instruction stream.

In a preferred embodiment as specified in the dependent claim 2, the loop control circuit is operative to execute a plurality of the instruction loops in a nested form, wherein an inner loop is initialized before starting execution of an immediately surrounding loop. This significantly reduces the overhead involved in initializing execution loops. Preferably, all the loop initialization is performed outside the outermost loop. In this case, no instruction cycles are devoted to loop initiation inside the nested loops. The inventors have realized that in particular digital signal processing involves frequent execution of usually short loops. Loop nesting of 2 or 3 levels deep occurs regularly. For example, for processing an image the outermost loop may involve processing of an image frame or field, where the next level loop involves processing of the blocks of pixels in the frame/field and the third level may involve processing of the pixels within the block. Traditionally, the loop initialization is at the same nesting level preceding the start of the loop. In a program with three nesting levels where each loop is executed 10 times (and consequently the body of the innermost loop is executed 1000 times), the outermost loop is initialized once, the second loop is initialized 10 times and the inner loop is initialized 100 times. In the system according to the invention, all loops may be initialized at the highest level, before starting execution of the first loop. This implies that only three loop initializations are required instead of the 111 (1 + 10 + 100) required in the known systems. This also makes the loop circuit highly suitable for vector processors. Whereas it may be possible to vectorize instructions within a loop, initialization of a loop is difficult to vectorize. Using the approach according to the invention, the number of non-vectorized instructions in a typical program can be reduced.
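The initialization counts quoted above can be checked with a short calculation. The following Python fragment is only an illustrative sketch; the function name init_counts and its argument are assumptions introduced here and do not appear in the description.

```python
def init_counts(iterations_per_level):
    """Return (conventional, hoisted) counts of executed loop initializations.

    iterations_per_level lists the iteration count of each loop, from the
    outermost loop inward (hypothetical helper, for illustration only).
    """
    conventional = 0
    enclosing_runs = 1  # how often the code directly surrounding this loop runs
    for count in iterations_per_level:
        # Conventionally, a loop is initialized each time its surrounding code runs.
        conventional += enclosing_runs
        enclosing_runs *= count
    # With all initialization hoisted outside the outermost loop: once per loop.
    hoisted = len(iterations_per_level)
    return conventional, hoisted


print(init_counts([10, 10, 10]))  # (111, 3)
```

For three nesting levels of 10 iterations each this yields 1 + 10 + 100 = 111 initializations in the conventional arrangement against 3 when all loops are initialized at the highest level.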

In itself, various ways may be used to determine/indicate a start of a loop. As described in the dependent claim 3, each instruction for the operation execution unit includes a loop start field that makes it possible to indicate that the instruction is a first instruction of a sequence of instructions forming an instruction loop to be executed by the operation execution unit. For example, one bit may be added to the regular instructions (typically those that can occur in an instruction loop) to indicate whether or not this instruction is the start of a loop. In this way, no indication of a start location and/or time of a loop needs to be provided. It will be appreciated that this comes at the expense of using at least one additional bit in the instruction. This increase of instruction size can be reduced by using instruction compression.

According to the measure as described in the dependent claim 4, the loop control circuit is operative, in response to detecting that the loop start field indicates a start of an instruction loop, to store an indication of a start address of the loop in the loop information associated with the loop. For example, the loop control circuit may retrieve the address of the current instruction from the program counter and store it in a register. Each time the end of the loop is reached (as indicated by the end information stored for the loop), the start address can be retrieved from the register. If so desired, the start address may also be stored in the form of an offset relative to the end of the loop (as indicated in the loop information), for example by indicating the number of instructions in the loop.

According to the measure as described in the dependent claim 5, the loop information is stored according to a sequential nesting level of the loop, where for a respective one of the nesting levels at most one loop can be specified at each moment in time; the loop control circuit being operative to store a current nesting level of instructions being executed; and update the nesting level in response to detecting a start of a loop by checking the loop start field; and detecting an end of a loop by comparing the program counter to the indication of the end of the loop stored for the loop. Using only a one-bit loop start indicator, nested loops can be started, where at each nesting level there can be at most one loop. An indication in the start field then implicitly indicates which loop is to be started (i.e. the loop at the next deeper level). Similarly, exiting a loop implies that control is returned to the next higher level (at the highest level, no loop is being executed, but normal sequential processing, which may be pipelined and/or vectorized, takes place). Assuming that a deeper loop is represented by a higher number, entering a loop results in incrementing the nesting level (or, similarly, the loop number) and exiting the loop results in decrementing the nesting level.

To overcome the limitation of only being able to initialize one loop at each nesting level, the measure of the dependent claim 6 describes that the loop start field makes it possible to indicate which one of a plurality of specifiable loops needs to be started. For example, each loop may be associated with a unique sequential number where the start field can include such a number. If the maximum number of loop nesting levels is MAX, a total of ⌈log₂(MAX)⌉ bits needs to be added to the applicable instructions.

According to the measure as described in the dependent claim 7, the loop information also includes an indication of a begin of the loop. In principle, the indication may take any suitable form, such as an absolute memory address or a relative memory address within an addressable range of a memory page or relative to a known position. In particular, if either the loop start address or loop end address is specified in one of those ways, the other address can be specified as an offset relative to the specified address. Such an offset represents the number of instructions in the loop.

According to the measure as described in the dependent claim 8, the loop control circuit is operative to detect a start of a loop by comparing the program counter to the indication of a begin of a loop stored in the loop information. In a situation where there is no time or position relationship between the loop initialization instruction and the start of the initialized loop, the start of a loop can be detected by comparing the current address (as present in or derivable from the program counter) with the start addresses of the loops as stored in the loop information. This comparison may take place by comparing the program counter to each stored loop start address until a match is found or all loop start addresses have been compared. This process may be optimized, for example by sorting start addresses, simplifying and/or speeding up the comparison process.
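As a software analogy of the sorted comparison mentioned above, the following Python sketch looks up the program counter in a sorted list of start addresses. It is an assumption-laden illustration only: the function name find_loop_start and the example addresses are hypothetical, and a hardware comparator would not be built this way.

```python
import bisect


def find_loop_start(pc, sorted_starts):
    """Return the index of pc in sorted_starts, or None if no loop starts at pc.

    sorted_starts must be in ascending order (hypothetical helper).
    """
    i = bisect.bisect_left(sorted_starts, pc)
    if i < len(sorted_starts) and sorted_starts[i] == pc:
        return i
    return None


# Example start addresses, chosen arbitrarily for illustration.
starts = sorted([0x40, 0x10, 0x28])
print(find_loop_start(0x28, starts))  # 1
print(find_loop_start(0x29, starts))  # None
```

Keeping the addresses sorted means that only the next expected start address has to be considered at any moment, rather than every stored start address.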

According to the measure as described in the dependent claim 9, the loop initialization instruction includes a plurality of fields for initializing loop information of a plurality of loops in one operation. Particularly if a wide memory is used, such as a memory for storing VLIW instructions, several loops can be initialized using only one instruction. This reduces the overhead in loop initialization even further.

To meet the object of the invention, a loop control circuit for use in a processor with an operation execution unit for executing instructions indicated by a program counter is operative to store respective associated loop information for a plurality of instruction loops; the loop information for an instruction loop including at least an indication of an end of the loop and a loop count for indicating a number of times the loop should be executed; detect that one of the loops needs to be executed and in response to said detection, load the loop information for the corresponding loop, and control the program counter to execute the corresponding loop according to the loaded loop information; initialize the loop information in response to a loop initialization instruction, where the initialization instruction is issued prior to and independent of a start of the loop initialized by the loop information.

To meet the object of the invention, a method of causing a processor to execute instruction loops specified by a program counter includes storing respective associated loop information for a plurality of instruction loops prior to and independent of a start of the loop; the loop information for an instruction loop including at least an indication of an end of the loop and a loop count; and detecting that one of the loops needs to be executed and in response to said detection, loading the information for the corresponding loop, and controlling the program counter to execute the corresponding loop according to the loaded loop information.

These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings:

FIG. 1 shows an exemplary program using the loop initialization according to the invention;

FIG. 2 shows a block diagram of the processor and circuit according to the invention;

FIG. 3 shows an embodiment of the processor and circuit according to the invention;

FIG. 4 shows a counter suitable for use by the loop control circuit; and

FIG. 5 shows a preferred processor in which the loop control circuit is used.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The loop control circuit according to the invention is particularly suitable for, but not limited to, use in digital signal processors (DSPs). In digital signal processing applications, loops and nested loops occur frequently, with relatively few instructions in a loop and usually uninterrupted processing of a loop. Such systems can benefit from the architecture according to the invention, which reduces the number of times a loop initialization instruction needs to be executed. The loop control circuit is also particularly suitable for pipelined processors since it allows free scheduling of the loop initialization instructions (as long as a loop is initialized before the start of the loop). As such, the instruction(s) immediately preceding the start of a loop may be used for any purpose as, for example, is best for maintaining a high filling degree of the pipeline.

The loop circuit can also advantageously be used in a vector processor. The vector processor can be used for regular, “heavy-duty” processing, in particular the processing of inner loops. As such, it can provide large-scale parallelism for the vectorizable part of the code to be executed. However, fully exploiting this parallelism is not always feasible, as many algorithms do not exhibit sufficient data parallelism of the right form. The so-called “Amdahl's Law” states that the overall speedup obtained from vectorization on a vector processor with P processing elements, as a function of the fraction of code that can be vectorized (f), equals (1−f+f/P)⁻¹. This means that when 50% of the code can be vectorized, an overall speedup of less than 2 is realized (instead of the theoretical maximum speedup of 32 for P = 32). This is because the remaining 50% of the code cannot be vectorized, and thus no speedup is achieved for this part of the code. Even if 90% of the code can be vectorized, the speedup is still less than a factor of 8. After vectorization of the directly vectorizable part of the code, most time is spent on the remaining code. The remaining code can be split into four categories:

address related instructions (e.g. incrementing a pointer into a circular buffer, using modulo addressing)

regular scalar operations (i.e. scalar operations that correspond to the main loop of the vector processor)

looping

irregular scalar operations

The loop control circuit reduces the time spent on looping and as such contributes to making vector processing more suitable for consumer electronic applications, in particular mobile communication, where the additional costs of a vector processor can only be justified if a significant speed-up can be achieved.
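The speedup figures quoted above for Amdahl's Law can be reproduced numerically. The sketch below assumes P = 32 processing elements, which is inferred from the stated theoretical maximum speedup of 32; the function name amdahl_speedup is introduced here for illustration.

```python
def amdahl_speedup(f, p):
    """Overall speedup (1 - f + f/p)^-1 for vectorizable fraction f on p elements."""
    return 1.0 / ((1.0 - f) + f / p)


for fraction in (0.5, 0.9):
    # P = 32 is an assumption derived from the quoted maximum speedup of 32.
    print(fraction, round(amdahl_speedup(fraction, 32), 2))
# 0.5 1.94
# 0.9 7.8
```

Both values are consistent with the statements that 50% vectorizable code gives a speedup of less than 2 and 90% vectorizable code gives less than a factor of 8, which is why reducing the non-vectorizable looping overhead matters.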

FIG. 1 shows an exemplary program using the loop initialization according to the invention. The exemplary program includes four loops, shown as N1 to N4, organized in three nesting levels. Loop N1 is the highest level. N2 is one level deeper and N3 and N4 are two successive loops at one level deeper. The program starts with an arbitrary number of instructions, indicated as 101 to 109. This is followed by initialization of all four loops, shown as 110 to 113. According to the invention, the loop initialization can be performed at any arbitrary point in the program, provided that it is before the starting address (in the figure: start_address) of the corresponding loop. As such there is also no strict reason for initializing a higher level loop before initializing an inner loop. In the initialization step, at least the loop count and an indication of the end of the loop (hereinafter referred to as loop end address) are specified. Depending on the implementation, an indication of the beginning of the loop may also be specified, hereinafter referred to as the loop start address. These three parameters fully specify each loop, so that when the start address is reached during program execution the loop can be started automatically without requiring any initiation instruction, i.e. a separate instruction to trigger the start of an execution of a loop. A detailed embodiment capable of doing so will be described with reference to FIG. 3. As can be seen in FIG. 1, this principle can be applied to nested loops, and also works for cases where more than one loop is present at one nesting level. If no loop start address is given (either explicitly or implicitly) in the initialization instruction, the trigger to start the loop can be incorporated in the first instruction of the loop, as will be described in more detail below.

In the example given in FIG. 1, all the initialization is performed outside the outermost loop N1. Since no instruction cycles are devoted to loop initiation inside the nested loops, the loop overhead is substantially reduced. It is also possible to perform some of the initialization for the inner loops inside the outer loops, but this reduces the advantages of this invention. For nested loops, an advantage is achieved if at least one inner loop is initialized before starting execution of an immediately surrounding loop. As indicated, preferably all loops are initialized at the main execution level outside any loop.

FIG. 2 shows a basic block diagram of the data processor 200 according to the invention. The data processor 200 is capable of executing instructions stored in an instruction memory 210. The instruction to be executed is specified by a program counter 220. The instruction memory may entirely or partly (e.g. in the form of an instruction cache) be incorporated in the processor. If so desired, the instruction memory may also be separate from the processor. The processor includes an operation execution unit 225 for executing the normal instructions indicated by the program counter. Special instructions, like processor configuration instructions, may be dealt with separately. This is not part of the invention and will not be described further. A loop control circuit 230 is capable of storing respective associated loop information for a plurality of instruction loops. The loop information for an instruction loop includes at least an indication of an end of the loop and a loop count for indicating a number of times the loop should be executed. The loop information may also include an indication of a start of the loop. The actual storage 232 (e.g. in the form of one or more register units) may be in the loop control unit 230 or connected to it. FIG. 2 shows an exemplary way of arranging the storage 232. The storage is divided into three register banks 235, 236 and 237, for storing start addresses, end addresses, and loop counts, respectively. In the figure, each bank can store four values. Shown are 241, 242, 243, and 244 for the start addresses, 251, 252, 253, and 254 for the end addresses, and 261, 262, 263, and 264 for the loop counts. As such, in this example a maximum of four loops can be initialized at each moment in time. The loop control unit is able to identify the values for one loop (for example for initialization of the values and for use of the values for executing a loop). The values of one loop of the respective loops may, for example, be indicated by a loop number. For example, loop no. 1 includes the values 241, 251, and 261; loop no. 2 includes the values 242, 252, and 262, etc. The loop control circuit is able to detect that one of the loops needs to be executed. Below, several ways of detecting this will be described in more detail. In response to detecting that a loop needs to be started, the loop control circuit is able to load the loop information for the corresponding loop, and control the program counter to execute the corresponding loop according to the loaded loop information. In this respect, the loop control circuit acts the same as known loop control circuits and this aspect will not be described in more detail. According to the invention, the loop control unit 230 is able to initialize the loop information in response to a loop initialization instruction, shown as 240. The loop control unit ensures that the supplied information is stored in the appropriate storage location of the storage 232 for use at a later moment. The initialization instruction must be issued prior to and is independent of a start of the loop initialized by the loop information. The loop initialization instruction may be loaded from the instruction memory 210 under control of the program counter 220. An instruction decode unit (not shown) may supply the information in the instruction to the loop control unit instead of providing the instruction to the execution unit 225.
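The arrangement of the storage 232 into three banks indexed by loop number can be modeled in software as follows. This Python sketch is illustrative only: the class name LoopStorage, its method names, the 1-based loop numbering, and the example addresses are assumptions made here, not features taken from FIG. 2.

```python
class LoopStorage:
    """Software stand-in for storage 232: three banks, one slot per loop."""

    def __init__(self, slots=4):
        self.start_addresses = [None] * slots  # models register bank 235
        self.end_addresses = [None] * slots    # models register bank 236
        self.loop_counts = [None] * slots      # models register bank 237

    def initialize(self, loop_no, end_address, loop_count, start_address=None):
        """Model a loop initialization instruction; loop_no is assumed 1-based."""
        i = loop_no - 1
        if not 0 <= i < len(self.end_addresses):
            raise IndexError("loop number out of range")
        # The start address is optional: it may instead be captured later,
        # when a loop start field is detected.
        self.start_addresses[i] = start_address
        self.end_addresses[i] = end_address
        self.loop_counts[i] = loop_count

    def load(self, loop_no):
        """Return (start address, end address, loop count) for one loop."""
        i = loop_no - 1
        return (self.start_addresses[i], self.end_addresses[i], self.loop_counts[i])


storage = LoopStorage()
storage.initialize(2, end_address=0x30, loop_count=10, start_address=0x20)
print(storage.load(2))  # (32, 48, 10)
```

Initialization only writes the slot; nothing is started. The loop control circuit would later call something like load() when it detects that the corresponding loop needs to be executed.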

To further illustrate the invention, the instruction sequence for a conventional zero-overhead loop processor, such as the Philips R.E.A.L. DSP, is shown in the left column of the following table (table 1), whereas the instruction sequence according to the invention is shown in the right column:

TABLE 1

Conventional zero-overhead loop     According to the invention
loop 1 init                         loop 1 init
loop 1 body {                       loop 2 init
  instr 1-1                         loop 3 init
  :                                 loop 1 body {
  loop 2 init                         instr 1-1
  loop 2 body {                       :
    inst 2-1                          loop 2 body {
    :                                   inst 2-1
    loop 3 init                         :
    loop 3 body {                       loop 3 body {
      inst 3-1                            inst 3-1
      :                                   :
    }                                   }
    :                                   :
  }                                   }
  :                                   :
}                                   }

As indicated above, the loop initialization instruction provides at least the loop count and a loop end address. For the loop control circuit to determine that a loop should be started, each instruction for the operation execution unit includes a loop start field that makes it possible to indicate that the instruction is a first instruction of a sequence of instructions forming an instruction loop to be executed by the operation execution unit. In practice all instructions may have such a loop start field to maintain a consistent instruction structure for all instructions. However, it will be appreciated that this is not required. For example, certain instructions may only be used for configuring a processor and not be suitable for use within a loop. In principle, such instructions do not need the field. In a simple form, the loop start field may be a one-bit field in the instruction. A predetermined value (e.g. binary ‘1’) may be used to indicate that the instruction is a first instruction of a loop, whereas the other binary value (e.g. ‘0’) is used for all instructions in the sequence that are not the first instruction of the loop. In the next table, an exemplary start field value is indicated to the left of each instruction.

TABLE 2

0   loop 1 init
0   loop 2 init
0   loop 3 init
    loop 1 body {
1     instr 1-1
0     :
      loop 2 body {
1       inst 2-1
0       :
        loop 3 body {
1         inst 3-1
0         :
        }
0       :
      }
0     :
    }

It will be appreciated that other encodings of the field are also possible as long as the loop control circuit can determine that an instruction is a first instruction in a loop. Preferably, in response to detecting that the loop start field indicates a start of an instruction loop, the loop control circuit 230 stores an indication of a start address of the loop in the loop information 232 associated with the loop. In itself any suitable indication may be stored, for example using a full absolute address, using a relative address within an addressable range (so relative to the beginning of the range), or using an address relative to the end address of the loop (e.g. using a count of the number of instructions in the loop).

Using only a one-bit start field it is possible to support multiple nested loops, as was illustrated in table 2. A limitation is that only one loop can be specified at each nesting level. Referring to FIG. 1, it would not be possible to have two successive loops N3 and N4 at the same nesting level, since the one-bit indicator cannot distinguish between the two loops at the same level. With this limitation, it is additionally required that the loop control circuit knows the nesting level of a loop. This can be achieved in a simple way, for example, by letting the loop number represent the nesting level (a sequentially higher loop number indicates a deeper loop). The loop control circuit stores a current loop no./nesting level of instructions being executed, for example in a register. Assuming the indicated sequential ordering of loops/nesting levels, the loop control circuit increments the current loop no./nesting level in response to detecting a start of a loop. As described above, it may detect the start of a loop by checking the loop start field of the instruction to be executed next by the processor. In response to detecting an exit of the loop, the loop control circuit decrements the current loop no./nesting level. The loop control circuit can detect an end of a loop by comparing the program counter to the end address stored for the current loop. An exit of a loop occurs if the end of the loop is detected and the loop has been executed according to the stored loop count.
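The nesting-level bookkeeping described above can be illustrated with a small simulation. The Python sketch below is a simplified software model under stated assumptions: the function name run, the program encoding as (start bit, name) pairs, and the rule that a set start bit is ignored when the program counter has just jumped back to the start address captured for the current level are all introduced here for illustration. Loop counts are assumed to be at least 1.

```python
def run(program, loop_info):
    """Simulate one-bit loop start fields with one loop per nesting level.

    program:   list of (start_bit, name) tuples; the list index is the address.
    loop_info: {nesting level: (end_address, loop_count)}, initialized up front.
    Returns the names of the executed instructions in order.
    """
    level = 0        # current nesting level; 0 means no loop is active
    start = {}       # start address captured per level on loop entry
    remaining = {}   # iterations left per level
    trace = []
    pc = 0
    while pc < len(program):
        start_bit, name = program[pc]
        # A set start bit means "enter the loop at the next deeper level",
        # unless this is merely a jump back for the next iteration.
        if start_bit and start.get(level) != pc:
            level += 1
            start[level] = pc
            remaining[level] = loop_info[level][1]
        trace.append(name)
        next_pc = pc + 1
        # End-of-loop check; repeats if nested loops end at the same address.
        while level > 0 and pc == loop_info[level][0]:
            remaining[level] -= 1
            if remaining[level] > 0:
                next_pc = start[level]  # iterate again
                break
            del start[level]            # loop exit: back to the next higher level
            level -= 1
        pc = next_pc
    return trace


program = [(0, "a"), (1, "b"), (1, "c"), (0, "d"), (0, "e"), (0, "f")]
loop_info = {1: (4, 2), 2: (3, 3)}  # level 1: b..e twice; level 2: c..d three times
print("".join(run(program, loop_info)))  # abcdcdcdebcdcdcdef
```

Note that the inner loop is re-entered on the second iteration of the outer loop without any instruction inside the outer loop re-initializing it; in this model the count is simply re-read from loop_info on each entry.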

In a further embodiment according to the invention, the loop start field makes it possible to indicate which one of a plurality of specifiable loops needs to be started. For example, by specifying a loop number in each instruction the loop control circuit can determine, by detecting a change in loop number between two successive instructions, that a new loop is entered or exited. The main execution level (not part of any loop) may for example be indicated using level 0 (zero). All other loops may be numbered in the sequence they appear in the program, but this is not required; any sequence is in principle allowed. For a program with three loops a distinction between the three loops and the main level must be made; this requires two bits. In table 3, an exemplary 2-bit start field value is indicated to the left of each instruction. The left column shows the working for three nested levels, whereas the right column shows it for two nesting levels, with two successive loops at level 2.

TABLE 3

Three nested levels          Two successive loops at level 2
00  loop 1 init              00  loop 1 init
00  loop 2 init              00  loop 2 init
00  loop 3 init              00  loop 3 init
    loop 1 body {                loop 1 body {
01  instr 1-1                01  instr 1-1
01  :                        01  :
10  loop 2 body {            10  loop 2 body {
10  inst 2-1                 10  inst 2-1
10  :                        10  :
11  loop 3 body {            10  :
11  inst 3-1                     }
11  :                        11  loop 3 body {
    }                        11  inst 3-1
10  :                        11  :
    }                            }
01  :                        01  :
}                            }
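The field width follows from the number of values that must be distinguished: one per loop plus one for the main level. The Python sketch below illustrates this counting; the function name start_field_bits is an assumption introduced here, and the encoding with level 0 reserved for the main level follows the example given above.

```python
def start_field_bits(num_loops):
    """Bits needed to distinguish num_loops loops plus the main level (value 0).

    This equals ceil(log2(num_loops + 1)); for a positive integer n that is
    n.bit_length(), which avoids floating-point rounding.
    """
    if num_loops < 1:
        raise ValueError("at least one loop is required")
    return num_loops.bit_length()


print(start_field_bits(3))  # 2
```

With three loops the four values 00, 01, 10 and 11 are all used, as in table 3; a fourth loop would require a third bit under this encoding.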

FIG. 3 shows a block diagram for a preferred embodiment of the zero-overhead loop (0 OHL) unit inside the program controller according to the principles explained with reference to FIG. 1. The only primary input of the 0 OHL unit is the loop instruction 300. This instruction consists of the loop-related part of the complete instruction flow, and when no loop instruction is present the signal loop_instruction equals no-operation (NOP). When a loop initialization instruction is issued, the input signal loop_instruction specifies loop count, start address and end address. The preferred zero-overhead loop hardware includes two address register units (in the figure: START ADDRESS UNIT 310 and END ADDRESS UNIT 320), a loop counter unit 330, a loop control unit 340, and three comparator units 350, 360, and 370. The hardware supports M loops, i.e. the maximum nesting level is M when each nesting level contains only one loop. Consequently, the start and end address units 310, 320 have M registers for storing the loop start and end addresses for each loop. Also, M loop counters are included in the loop counter unit 330. When a loop initialization occurs, the loop parameters (start address, end address and loop count) are written into the matching registers. The loop instruction contains an indication of the loop being initialized, preferably in a form directly convertible to the register_select signal (and counter_select signal for the loop counter unit). The loop control unit 340 uses this information to select the matching register via the register_select signals and counter_select signal. The respective register values and counter value are provided via the respective input signals. The respective write_enable signals and set_counter signal are used for controlling the writing of the register/counter value to the indicated register/counter field.

The current loop is defined as the most recent loop the program has entered. The loop control unit 340 uses the current loop pointer 342 for generating the signal register_select, which selects the loop parameters for the current loop. The respective comparators at the outputs of the start and end address units 310, 320 are responsible for comparing the program counter 380 value to the values already stored in these units. A comparator may compare all M register values of its register unit to the current value of the program counter in parallel. If it detects a matching value, the comparator indicates equality. When more than one start address value matches the program counter, the current loop is determined by taking the loop corresponding to the smallest end address as the current loop. When more than one end address value matches the program counter, the loops are treated in an order starting from the current loop. In a preferred embodiment, the loop control unit 340 also performs ordering of start addresses and generates a signal (in the figure: next_select) for selecting the next start address (in the figure: the output ‘next’ of start address unit) expected after the present program counter value. Correspondingly, when two or more loops start at the same address, the loop with the smallest end address is automatically selected by the signal next_select. In this way, multiple loops starting at the same address can also be treated without extra overhead.
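The selection rule for coinciding start addresses can be written out as a short sketch. The Python function below is a software illustration only; the name select_current_loop, the list-based register model, and the example addresses are assumptions introduced here, whereas the hardware compares all M registers in parallel.

```python
def select_current_loop(pc, start_addresses, end_addresses):
    """Return the index of the loop to enter at pc, or None if no loop starts here.

    When several start addresses match pc, the loop with the smallest end
    address is selected, following the rule described in the text.
    """
    matches = [i for i, start in enumerate(start_addresses) if start == pc]
    if not matches:
        return None
    return min(matches, key=lambda i: end_addresses[i])


# Loops 0 and 1 both start at 0x10 (hypothetical addresses); loop 1 ends first.
starts = [0x10, 0x10, 0x20]
ends = [0x40, 0x18, 0x30]
print(select_current_loop(0x10, starts, ends))  # 1
```

Choosing the smallest end address picks the loop that must complete its iterations before the other loop starting at the same address can reach its own end address.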

At any point in the program (also when the program counter corresponds to an address outside the outermost loop) one start address (in the figure: next) is selected and compared to the program counter value. Additionally, when the program counter is inside at least one loop, the program counter is compared to one end address (in the figure: output of the END ADDRESS UNIT) corresponding to the configuration of the current loop. When an equality is detected at the comparator of the start address unit 310, the loop control unit 340 updates the current loop pointer 342, the current loop being specified by the new start address, the end address residing in the corresponding end address register, and the iteration count residing in the shadow register of the corresponding counter.

When an equality is detected at the comparator of the end address unit 320, the loop control unit 340 enables the corresponding loop counter (in the figure: count_enable). The loop counter which is already selected by means of the signal count_select is then decremented and compared to 0. If the counter value is 0, the loop control unit updates the current loop pointer (the program goes out of the current loop), the program counter is incremented and the program execution continues as described above with the new value of the current loop. At this point, if the outermost loop corresponding to the loop which has just exited still has more iterations to go, the loop counter value must be reinitialized to the original value so that the loop can be started again during the next iteration of the outer loop. For this reason, a check must be included in the loop control unit for determining whether this is the case. If the check is positive (i.e. the corresponding outermost loop is still active), the loop control unit generates a reset_counter signal which (re-)copies the original number of loop iterations from a shadow register into the loop register. Such a use of a shadow register is known from U.S. Pat. No. 6,064,712. FIG. 4 illustrates a loop counter circuit with a shadow register 400. The value stored in the counter 410 can be decremented by block 420. A multiplexer can be controlled to load into the counter 410 either the decremented value, the value stored in the shadow register or an input value 440. The signal select 450 is generated using signals set_counter, reset_counter and count_enable (shown in FIG. 2), and used to control the multiplexer. When a loop configuration instruction is received (set_counter), the number of iterations specified for the new loop configuration can be loaded via the input value 440. The other two options are updating the loop counter from the shadow register (reset_counter) and decrementing the loop counter (count_enable), as seen in FIG. 2.
If equality is detected with the end address but the decremented count value is not zero, the start address of the corresponding loop (selected by the register_select input of the START ADDRESS UNIT 310) is copied into the program counter 380, causing the loop to be repeated.
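
The counter behaviour and the end-of-loop decision can be sketched for a single, non-nested loop. The names LoopCounter, next_pc_at_loop_end and run_loop are hypothetical. The sketch assumes that set_counter loads the shadow register together with the counter, which the description implies but does not state, and that the loop count is at least 1.

```python
from typing import List, Tuple


class LoopCounter:
    """Hypothetical model of a loop counter with a shadow register (FIG. 4)."""

    def __init__(self) -> None:
        self.counter = 0
        self.shadow = 0

    def set_counter(self, value: int) -> None:
        # Assumption: initialization loads both the counter and the shadow.
        self.counter = value
        self.shadow = value

    def reset_counter(self) -> None:
        # Restore the original iteration count from the shadow register.
        self.counter = self.shadow

    def count_enable(self) -> None:
        self.counter -= 1


def next_pc_at_loop_end(counter: LoopCounter, pc: int, start_address: int) -> int:
    """Decide the next program counter value when pc equals the end address."""
    counter.count_enable()
    if counter.counter == 0:
        return pc + 1          # loop finished: fall through
    return start_address       # iterations remain: jump back to the start


def run_loop(start_address: int, end_address: int,
             loop_count: int) -> Tuple[List[int], LoopCounter]:
    """Trace the program counter through one loop, without nesting."""
    if loop_count < 1:
        raise ValueError("this sketch assumes at least one iteration")
    counter = LoopCounter()
    counter.set_counter(loop_count)
    pc, trace = start_address, []
    while pc <= end_address:
        trace.append(pc)
        if pc == end_address:
            pc = next_pc_at_loop_end(counter, pc, start_address)
        else:
            pc += 1
    return trace, counter


trace, counter = run_loop(start_address=4, end_address=6, loop_count=3)
print(trace)
```

After the loop exits, the counter holds 0; calling reset_counter() restores the original count, which is the step needed when a surrounding loop will run the inner loop again.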

The loop control circuit is preferably used in a processor optimized for signal processing. Such a processor may be a DSP or any other suitable processor/micro-controller. The remainder of the description describes using the circuit in a highly powerful scalar/vector processor. The scalar/vector processor is mainly used for regular, “heavy/duty” processing, in particular the processing of inner-loops. The vast majority of all signal processing will be executed by the vector section of the scalar/vector processor. The operation of the regular scalar operations can be optimized by tightly integrating scalar and vector processing in one processor. A separate micro-controller or DSP 130 may be used to perform the irregular tasks and, preferably, controls the scalar/vector processor as well.

FIG. 5 shows the main structure of the processor in which the loop control circuit according to the invention may be used. The processor includes a pipelined vector processing section 510. To support the operation of the vector section, the scalar/vector processor includes a scalar processing section 520 arranged to operate in parallel to the vector section. Preferably, the scalar processing section is also pipelined. To support the operation of the vector section, at least one functional unit of the vector section also provides the functionality of the corresponding part of the scalar section. For example, the vector section of a shift functional unit may functionally shift a vector, where a scalar component is supplied by (or delivered to) the scalar section of the shift functional unit. As such, the shift functional unit covers both the vector and the scalar section. Therefore, at least some functional units not only have a vector section but also a scalar section, where the vector section and scalar section can co-operate by exchanging scalar data. The vector section of a functional unit provides the raw processing power, where the corresponding scalar section (i.e. the scalar section of the same functional unit) supports the operation of the vector section by supplying and/or consuming scalar data. The vector data for the vector sections are supplied via a vector pipeline.

In the preferred embodiment of FIG. 5, the scalar/vector processor includes the following seven specialized functional units.

Instruction Distribution Unit (idu 550). The idu contains the program memory 552, reads successive vliw instructions and distributes the 7 segments of each instruction to the 7 functional units. Preferably, it contains the loop unit that supports zero-overhead looping according to the invention.

Vector Memory Unit (vmu 560). The vmu contains the vector memory (not shown in FIG. 5).

The Code-Generation Unit (cgu 562). The cgu is specialized in finite-field arithmetic, for example for generating vectors of cdma code chips as well as related functions, such as channel coding and CRC.

ALU-MAC Unit (amu 564). The amu is specialized in regular integer and fixed-point arithmetic.

ShuFfle Unit (sfu 566). The sfu can rearrange elements of a vector according to a specified shuffle pattern.

Shift-Left Unit (slu 568). The slu can shift the elements of the vector by a unit, such as a word, a double word or a quad word to the left. The produced scalar is offered to its scalar section.

Shift-Right Unit (sru 570). The sru is similar to the slu, but shifts to the right. In addition it has the capability to merge consecutive results from intra-vector operations on the amu.

As indicated above, many different ways may be used to indicate a start and end of a loop. In a preferred embodiment, a start address and end address may be specified using respective 16-bit addresses. The loop counter may also be specified using 16 bits. Consequently, 48 bits are required for specifying parameters of a loop initialization instruction. Assuming that a maximum of three loops can be specified, a further two bits are required for indicating the loop, giving a total of 50 bits. Additionally, bits are required for identifying the loop initialization instruction among the possible instructions. If the instruction width allows, advantageously the loop initialization instruction includes a plurality of fields for initializing loop information of a plurality of loops in one operation. Particularly if the loop control circuit is used in a VLIW (Very Long Instruction Word) processor, such as for example shown in FIG. 5, more than one loop can be configured in one instruction. For the VLIW processor of FIG. 5, preferably 128-bit wide instructions are used. The instruction may be structured such that one bit is used to distinguish between a regular VLIW instruction (to be executed by the execution units) and an IDU instruction. An IDU instruction may use two bits to distinguish between four IDU instructions (being call, return, loop, or end-of-program). Using, as described above, an instruction memory with an address width of 16 bits, 16-bit loop counters, and 2 bits for identifying a loop, it is possible to configure two loops in one instruction. The fields of the instruction can then be as indicated in Table 4. The second column indicates the field width.
TABLE 4
<IDU instruction, VLIW instruction>    1 bit
<IDU command>                          2 bits
<loop number1>                         2 bits
<loop count 1>                         16 bits
<start_address1>                       16 bits
<end_address1>                         16 bits
<loop number2>                         2 bits
<loop count 2>                         16 bits
<start_address2>                       16 bits
<end_address2>                         16 bits

It will be appreciated that the various ways shown for initializing a loop may be used in combination with techniques for compacting code (e.g. by compressing instructions). To clarify the principles of the invention, no such compaction has been shown.
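
The field widths of Table 4 can be checked with a short calculation, together with one possible way of packing the fields into an instruction word. The field names, the most-significant-first ordering, and the example command code 2 for a loop instruction are assumptions made for illustration only; the description specifies the widths but not the encoding.

```python
from typing import Dict

# Field names are hypothetical; widths follow Table 4.
FIELDS = [
    ("idu_flag", 1), ("idu_command", 2),
    ("loop_number1", 2), ("loop_count1", 16),
    ("start_address1", 16), ("end_address1", 16),
    ("loop_number2", 2), ("loop_count2", 16),
    ("start_address2", 16), ("end_address2", 16),
]

TOTAL_BITS = sum(width for _, width in FIELDS)
SINGLE_LOOP_BITS = 2 + 16 + 16 + 16  # loop number, count, start, end


def pack(values: Dict[str, int]) -> int:
    """Pack the fields into one integer, first field most significant (assumed)."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> Dict[str, int]:
    """Recover the fields from a packed word, last field first."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


example = {
    "idu_flag": 1, "idu_command": 2,  # command code 2 is an assumed value
    "loop_number1": 0, "loop_count1": 10,
    "start_address1": 0x0100, "end_address1": 0x0120,
    "loop_number2": 1, "loop_count2": 4,
    "start_address2": 0x0104, "end_address2": 0x0110,
}
word = pack(example)
print(SINGLE_LOOP_BITS, TOTAL_BITS, TOTAL_BITS <= 128)
```

Two loop configurations plus the three identification bits occupy 103 bits, so they fit in a 128-bit instruction, whereas a third loop configuration of 50 bits would not.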

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The words “comprising” and “including” do not exclude the presence of other elements or steps than those listed in a claim.

1. A data processor for executing instructions stored in an instruction memory and which are specified by a program counter; the processor including: an operation execution unit for executing instructions indicated by the program counter; and a loop control circuit operative to: store respective associated loop information for a plurality of instruction loops; the loop information for an instruction loop including at least an indication of an end of the loop and a loop count for indicating a number of times the loop should be executed; detect that one of the loops needs to be executed and in response to said detection, load the loop information for the corresponding loop, and control the program counter to execute the corresponding loop according to the loaded loop information; initialize the loop information in response to a loop initialization instruction, where the initialization instruction is issued prior to and independent of a start of the loop initialized by the loop information.
2. A data processor as claimed in claim 1, wherein the loop control circuit is operative to execute a plurality of the instruction loops in a nested form, wherein an inner loop is initialized before starting execution of an immediately surrounding loop.
3. A data processor as claimed in claim 1, wherein each instruction for the operation execution unit includes a loop start field enabling to indicate that the instruction is a first instruction of a sequence of instructions forming an instruction loop to be executed by the operation execution unit.
4. A data processor as claimed in claim 3, wherein the loop control circuit is operative, in response to detecting that the loop start field indicates a start of an instruction loop, to store an indication of a start address of the loop in the loop information associated with the loop.
5. A data processor as claimed in claim 2, wherein the loop information is stored according to a sequential nesting level of the loop, where for a respective one of the nesting levels at most one loop can be specified at each moment in time; the loop control circuit being operative to store a current nesting level of instructions being executed; and update the nesting level in response to: detecting a start of a loop by checking the loop start field; and detecting an end of a loop by comparing the program counter to the indication of the end of the loop stored for the loop.
6. A data processor as claimed in claim 3, wherein the loop start field enables to indicate which one of a plurality of specifiable loops needs to be started.
7. A data processor as claimed in claim 1, wherein the loop information includes an indication of a beginning of the loop.
8. A data processor as claimed in claim 7, wherein the loop control circuit is operative to detect a start of a loop by comparing the program counter to the indication of a beginning of a loop stored in the loop information.
9. A data processor as claimed in claim 1, wherein the loop initialization instruction includes a plurality of fields for initializing loop information of a plurality of loops in one operation.
10. A loop control circuit as claimed in claim 1.
11. A method of causing a processor to execute instruction loops specified by a program counter; the method including: storing respective associated loop information for a plurality of instruction loops prior to and independent of a start of the loop; the loop information for an instruction loop including at least an indication of an end of the loop and a loop count; and detecting that one of the loops needs to be executed and in response to said detection, loading the information for the corresponding loop, and controlling the program counter to execute the corresponding loop according to the loaded loop information.
12. A method as claimed in claim 11, wherein a plurality of the instruction loops can be executed in a nested form, and the method includes storing loop information for an inner loop prior to starting execution of an immediately surrounding loop.
13. A computer program product operative to cause a processor to perform the steps of claim 11.